Zero-shot learning has received increasing interest as a means to alleviate the often prohibitive expense of annotating training data for large-scale recognition problems. These methods have achieved great success via learning intermediate semantic representations in the form of attributes and, more recently, semantic word vectors. However, they have thus far been constrained to the single-label case, in contrast to the growing popularity and importance of more realistic multi-label data. In this paper, for the first time, we investigate and formalise a general framework for multi-label zero-shot learning, addressing the unique challenge therein: how to exploit multi-label correlation at test time with no training data for those classes? In particular, we propose (1) a multi-output deep regression model to project an image into a semantic word space, which explicitly exploits the correlations in the intermediate semantic layer of word vectors; (2) a novel zero-shot learning algorithm for multi-label data that exploits the unique compositionality property of semantic word vector representations; and (3) a transductive learning strategy to enable the regression model learned from seen classes to generalise well to unseen classes. Our zero-shot learning experiments on a number of standard multi-label datasets demonstrate that our method outperforms a variety of baselines.
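To make the compositionality idea concrete, the following minimal sketch scores candidate label sets for a test image by comparing its projected word-space embedding against the summed word vectors of each candidate set. It is a hypothetical illustration, not the paper's actual algorithm: the `predict_label_set` helper, the toy word vectors, and the assumption that a regression model has already mapped the image into word space are all introduced here for exposition, and the full method additionally models label correlations and uses transductive adaptation.

```python
import numpy as np
from itertools import combinations

def l2_normalise(v):
    return v / (np.linalg.norm(v) + 1e-12)

def predict_label_set(image_embedding, label_word_vecs, max_labels=2):
    """Score every candidate label subset by cosine similarity between the
    image's word-space embedding and the normalised sum of the subset's
    label word vectors (the compositional multi-label prototype)."""
    x = l2_normalise(image_embedding)
    best_set, best_score = None, -np.inf
    for k in range(1, max_labels + 1):
        for subset in combinations(label_word_vecs, k):
            # Compositionality: a multi-label prototype is built by summing
            # the word vectors of its constituent single labels.
            proto = l2_normalise(sum(label_word_vecs[l] for l in subset))
            score = float(x @ proto)
            if score > best_score:
                best_set, best_score = set(subset), score
    return best_set, best_score

# Toy usage with made-up 4-d word vectors for unseen labels; in practice the
# image embedding would come from the learned multi-output regression model.
word_vecs = {
    "dog":  np.array([0.9, 0.1, 0.0, 0.0]),
    "ball": np.array([0.0, 0.8, 0.3, 0.0]),
    "sky":  np.array([0.0, 0.0, 0.1, 0.9]),
}
image_in_word_space = np.array([0.6, 0.6, 0.2, 0.0])
print(predict_label_set(image_in_word_space, word_vecs))
```

The exhaustive enumeration over label subsets is only tractable for small vocabularies and small `max_labels`; it serves here purely to show how word-vector composition lets a model trained only on single seen labels score unseen multi-label combinations at test time.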